Large Deviations of the Maximum Eigenvalue in Wishart Random Matrices 
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We compute analytically the probability of large fluctuations to the left of the mean of the largest 
eigenvalue in the Wishart (Laguerre) ensemble of positive definite random matrices. We show that 
the probability that all the eigenvalues of a (N x N) Wishart matrix W = X T X (where X is a 
rectangular M xN matrix with independent Gaussian entries) are smaller than the mean value (A) = 

N/c decreases for large N as ~ exp | — ^A f2 $_ + 1; cj j , where (3 = 1, 2 corresponds respectively 

to real and complex Wishart matrices, c = N/M < 1 and $_(a;;c) is a rate (sometimes also called 
large deviation) function that we compute explicitly. The result for the Anti- Wishart case (M < N) 
simply follows by exchanging M and N. We also analytically determine the average spectral density 
of an ensemble of Wishart matrices whose eigenvalues are constrained to be smaller than a fixed 
barrier. Numerical simulations are in excellent agreement with the analytical predictions. 
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I. INTRODUCTION 

Consider a rectangular (M x N) matrix X whose el- 
ements Xij represent some data. The N entries of 
each of the M rows constitute the components of an N- 
dimensional vector Xi (with i = 1,2, ... , M). The vector 
Xi (the i-th row of the array) represents the i-th sam- 
ple of the data and the matrix element X^ represents 
the j-th component of the vector Xi. For example, sup- 
pose we are considering a population of M students in 
a class, and for each student we have the data of their 
heights, their marks in an examination, their weights etc. 
forming a vector of N elements (or traits) for each of the 
M students. Then the product W = X T X is a posi- 
tive definite symmetric (N x N) matrix that represents 
the covariance matrix of the data (unnormalized). This 
matrix characterizes the correlations between different 
traits. The spectral properties of this matrix, i.e., its 
eigenvectors and eigenvalues, play a very important role 
in the so called 'principal components analysis' (PCA) 
of multivariate data, a technique that is used regularly 
in detecting hidden patterns in data and also in image 
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processing [1, 2, 3], amongst other applications. In PCA, 
one diagonalizes the covariance matrix W and identifies 
all the eigenvalues and eigenvectors. The data are usu- 
ally maximally scattered in the direction of its princi- 
pal eigenvector, corresponding to the largest eigenvalue 
and are least scattered in the direction of the eigenvec- 
tor corresponding to the minimum eigenvalue. One can 
then prune the data by successively getting rid of the 
components (setting them to zero) along the eigenvec- 
tors corresponding to the smaller eigenvalues, but retain- 
ing the components along the larger eigenvalues, in par- 
ticular those corresponding to the maximal eigenvalue. 
This method thus reduces the effective dimension of the 
data. This technique is called 'dimensional reduction' 
and forms the basis of e.g, image compression in com- 
puter vision [3]. 

When the underlying data are random, e.g. the ele- 
ments of the matrix X are independent and identically 
distributed (i.i.d) random variables, real or complex, 
drawn from a Gaussian distribution, the product matri- 
ces W = X^X constitute the so called Wishart ensemble, 
named after Wishart who first introduced them [4]. In 
literature one can also find the term 'Laguerre' ensemble, 
because the Laguerre polynomials arise in the analytical 
treatment of its spectral properties. 

These Wishart random matrices have been extremely 
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useful in multivariate statistical data analysis [1, 5] men- 
tioned above (where W represents the covariance matrix) 
with applications in various fields ranging from meteoro- 
logical data [6] to finance [7, 8]. Such matrices are also 
useful to analyze the capacity of channels with multiple 
antennae and receivers [9]. They also appear in nuclear 
physics [10], quantum chromodynamics [11] and also in 
statistical physics such as in a class of (l + l)-dimensional 
directed polymer problems [12]. Recently Wishart ma- 
trices have also been used in the context of knowledge 
networks [13] and new mathematical results for the dis- 
tribution of the matrix elements for the Anti- Wishart 
matrices (when M < TV) have been obtained [14, 15]. 

Given that the joint distribution of the elements of 
the (M x TV) matrix X (real or complex) is a Gaus- 



sian, P[X] cx exp -f tr(XtX) 



where the Dyson index 



[3 = 1,2 corresponds respectively to real and complex 
matrices [16], the spectral properties of the Wishart ma- 
trix W = X'X have been studied extensively for many 
decades. For the case when M > N (the number of 
samples is larger than the dimension) it is known that all 
the eigenvalues are positive, a typical eigenvalue scales as 
A ~ TV for large TV, and the average density of eigenvalues 
in the large TV limit has a scaling form pat(A) w j^f (j?), 
where f(x) is the Marcenko-Pastur [17] function on the 
compact support x G [a;_,a;+]: 
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f(x) = - — v / {x+-x)(x-x^) 
2irx 



(1) 



with x± = (±±l) and c = TV/A/ (with c < 1). 

(This result was also rederived by a different method by 
Dyson [18] and the spectral fluctuations were numerically 
investigated by Bohigas et al. [19]). Thus, for c < 1, all 
the eigenvalues lie within a compact Marcenko-Pastur sea 
and the average eigenvalue, 
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For all c < 1, the distribution goes to zero at the edges 
X- and x+. For the case c = 1 [x— = and x+ = 4), 
the distribution diverges as x~ x l 2 at the origin, fix) = 
^ — x)/x for < x < 4 (shown schematically in Fig. 
1). For the Anti- Wishart case (M < TV, i.e., c > 1) where 
one has M positive eigenvalues (the rest of the (TV — 
M) eigenvalues are identically zero), the corresponding 
result can be obtained from the M > TV case simply by 
exchanging M and TV. 

Another important issue in the context of PCA is the 
distribution of the largest eigenvalue of a Wishart ran- 
dom matrix and a lot of recent work has been devoted 
to this question [5, 12, 20, 21, 22, 23]. From the exact 
analytical form of the density of states, it follows that 
the average of the maximum eigenvalue for large TV is 

(A raax ) ~ x + (c)N where x + (c) = + 1J . However, 
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FIG. 1: The dashed line shows schematically the Marcenko- 
Pastur form of the average density of states for c = 1. The 
average eigenvalue for c = 1 is (A) = TV. For c = 1, the 
largest eigenvalue is centered around its mean (A max ) = 4TV 
and fluctuates over a scale of width TV 1 / 3 . The probability 
of fluctuations on this scale is described by the Tracy- Widom 
distribution (shown schematically). 



for finite but large TV, the maximum eigenvalue fluctu- 
ates, around its mean x + (c)TV, from one sample to an- 
other. A natural question is: what is the full probability 
distribution of the largest eigenvalue A ma x? Recently, 
Johansson [12] and independently Johnstone [5] showed 
that for large TV these fluctuations typically occur over a 
scale ~ ©(TV 1 / 3 ) around the mean, i.e. the upper edge 
of the Marcenko-Pastur distribution, and the probability 
of typical fluctuations \ = TV _1 ' 3 [A max — x+(c)TV], prop- 
erly centered and scaled, is described by the well known 
Tracy- Widom distribution (see Section II for details). 

Note that the Tracy- Widom distribution describes the 
probability of typical and small fluctuations of A max over 
a narrow region of width ~ ©(TV 1 / 3 ) around the mean 
(A max ) ~ %+(c) TV- A question that is particularly impor- 
tant in the context of PCA is how to describe the prob- 
ability of atypical and large fluctuations of A max around 
its mean, say over a wider region of width ~ O(TV)? For 
example, what is the probability that all the eigenvalues 
of a Wishart random matrix are less than the average 
(A) w TV/c for large N? This is the same as the probability 
that A max < TV/c. Since (A max ) ~ .£+(<:) TV, this requires 
the computation of the probability of an extremely rare 
event characterizing a large deviation of ~ O(N) to the 
left of the mean (see e.g. a schematic picture for c = 1 
in Fig. 1). Questions of this kind have been recently 
addressed in ref. [24] on which we heavily rely, while for 
the general large deviations theory in connection with 
random matrices the reader is referred to [25]. 

In the context of PCA, this large deviation issue arises 
quite naturally because one is there interested in getting 
rid of redundant data by the 'dimension reduction' tech- 
nique and keeping only the principal part of the data in 
the direction of the eigenvector representing the maxi- 
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mum eigenvalue, as mentioned before. The 'dimension 
reduction' technique works efficiently only if the largest 
eigenvalue is much larger than the other eigenvalues. 
However, if the largest eigenvalue is comparable to the 
average eigenvalue (A), the PCA technique is not very 
useful. Thus, the knowledge of large negative fluctua- 
tions of A max from its mean (A max ) « x+(c) TV provides 
useful information about the efficiency of the PCA tech- 
nique. 

The main purpose of this paper is to provide exact 
analytical results for these large negative fluctuations of 
Amax from its mean value. Rigorous mathematical results 
about the asymptotics of the Airy-kernel determinant 
(i.e. the probability that the largest eigenvalue lies deep 
inside the Marccnko-Pastur sea) for the case c = 1 and 
/? = 2 have been recently obtained [26]. Here we follow 
the Coulomb gas approach [16] [27], which interprets the 
eigenvalues of a random matrix as a fluid of charged inter- 
acting particles, and use standard functional integration 
methods of statistical physics. This approach has been 
exploited in the context of the Laguerre ensemble for the 
first time by Chen and Manning [28], who performed a 
detailed asymptotic analysis of the level spacing for gen- 
eral (3 and a > — 1 (where a is essentially the prcfactor of 
the external logarithmic potential, see e.g. (25)) and de- 
termined the distribution of the two smallest eigenvalues 
in a certain double-scaling limit. These techniques have 
been also recently used to obtain analytically the large 
negative fluctuations of the maximum eigenvalue for the 
Gaussian ensembles [24]. Here we adopt this method for 
the Wishart ensemble. 

We show that for c < 1 , the probability of large fluctu- 
ations to the left of the mean (A max ) w x+(c) TV behaves, 
for large TV, as 



Prob [A max < t, TV] 



exp 



2 



x + (c)N -t 
TV ; 



(3) 



where t ~ O(N) < x+(c) TV is located deep inside the 
Marcenko-Pastur sea and <&_(x;c) is a certain left rate 
(sometimes also called large deviation) function with x 
being the main argument of the function and c being a 
parameter. In this paper, we compute the rate function 
$_ (x; c) explicitly. Knowing this function, it then follows 
that for large TV 



Prob [A max < (A) = N/c,N] ~ cxp(-0(c)TV 2 ), 
where the coefficient 



(4) 



(5) 



For example, for the case c = 1 (M = TV), we show that 
0(1) = /? (log2-g) =0.177522.../?. (6) 



The corresponding result for the Anti- Wishart matrices 
(M < TV) simply follows by exchanging M and TV. In 
this paper, we focus only on the left large deviations of 
Amax- The corresponding probability of large fluctuations 
of A max to the right of the mean (A max ) was previously 
computed explicitly by Johansson [12] (see the next sec- 
tion for details). 

As a byproduct of our analysis, we provide the gen- 
eral expression for the spectral density of a constrained 
Wishart ensemble of matrices whose eigenvalues are re- 
stricted to be smaller than a fixed barrier. 

The paper is organized as follows. In section II, we 
set up notations, we provide some mathematical prelim- 
inaries and we recall some known results for the Wishart 
random matrices as well as, for the sake of comparison, 
of Gaussian random matrices. In section III we outline 
the functional integration method followed by the steep- 
est descent calculation. In subsection III A, we derive 
the left rate function explicitly for the special case c = 1 
and in subsection III B we extend the results to the case 
c < 1. In section IV numerical simulations are compared 
to the analytical predictions. Section V concludes the 
paper with a summary and discussion, while the analyti- 
cal computation of the rate function for c < 1 is reported 
in the Appendix. 



II. WISHART AND GAUSSIAN RANDOM 
MATRICES: SOME PRELIMINARIES 



We consider a rectangular (M x TV) matrix X with M 
rows (representing M different samples) and TV columns 
(representing TV components of each sample). We assume 
that the entries of the matrix X are i.i.d random vari- 
ables each drawn independently from a standard normal 
distribution, such that the joint distribution of the ele- 
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where the 



mcnts is given by P[X] oc exp 
Dyson index /? = 1, 2 corresponds respectively to real and 
complex matrices [16]. One then constructs the Wishart 
matrix W = X'X by taking the product. The first nat- 
ural question is: Given the distribution of X, what is 
the joint distribution of the elements of Wl It turns out 
that this is not quite easy to compute. For the case when 
M > TV (when the number of samples is larger or equal 
to the dimension of the vector), this was computed by 
Wishart [4] . The corresponding calculation for the oppo- 
site 'Anti- Wishart' case, when M < N, turns out to be 
much more complicated. This was first obtained numer- 
ically [13] and only recently an analytical expression has 
been found [14, 15]. 

In contrast with the probability distribution of the 
matrix elements of W itself, the joint probability dis- 
tribution (jpd) of its eigenvalues was known since a long 
time [29], and from it all the interesting spectral prop- 
erties of the ensemble can be derived. Wc summarize 



4 



them here together with the corresponding ones for the 
Gaussian ensemble. 



A. Wishart (Anti-Wishart) ensemble 

For the case when M > N, all the TV eigenvalues are 
positive and their jpd is given by 

N 

P N (Xi,...,X N ) =K N e 2 2-*=i A « || A? x 



i<fe 



(7) 



where JsTjv is a normalization constant and the parameter 
(3 = 1,2 corresponds respectively to the real and complex 
X. On the other hand, for the Anti-Wishart case (M < 
TV), there are only M positive eigenvalues (the rest of the 
N — M eigenvalues are exactly 0) and their jpd is given 
exactly by the same formula as in (7) except that N and 
M are interchanged [15]. 

For the Wishart matrices with M > N, in the large 
N limit, the average density of states has the scaling 
form, pat(A) r* j^f (— ) where f[x) is the Marcenko- 
Pastur [17] function defined in (1). The corresponding 
result for the Anti-Wishart case (M < N) where one has 
M eigenvalues, simply follows by exchanging M and TV. 

For large TV the maximum eigenvalue fluctuates around 
its average (A max ) w x + (c)N and the typical fluctua- 
tion occurs over a scale of width 0(iV 1//3 ) around the 
mean. Johansson [12] and independently Johnstone [5] 
computed the limiting distribution of these typical fluc- 
tuations around the mean. They showed that for large 
N and for c < 1 [5, 12] 
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N 



1 



4/3 



N 1/3 X (8) 



where the random variable \ nas an A-independent lim- 
iting distribution Prob(x < x) = Fp(x), which is the well 
known Tracy- Widom distribution (see below). 



B. Gaussian ensemble 



In the case of a random (N x N) Gaussian matrix [27, 
30], the jpd of eigenvalues is given by: 



P/v(Ai, . . . , Xn) — Bn e 



j<k 



Afcl 



(9) 



where Bn normalizes the jpd and f3 = 1, 2 and 4 cor- 
respond respectively to the GOE (Gaussian orthogonal 
ensemble), GUE (Gaussian unitary ensemble) and GSE 
(Gaussian symplcctic ensemble). 

The average density of states in the large N limit 
has the scaling form pjv(A) ~ (^7) wn ere 



f sc (x) is the famous Wigner semi-circular law: f sc {x) 



— [2 — x 2 ] with compact support over x G [— V2, v2]- 
Furthermore, the analogous asymptotic form of A max 
is known to be [31] 
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where the random variable x nas again the limiting N- 
independent distribution, Prob[x < x] = Fp(x). 

In this paper, the main interest is focused on the 
largest eigenvalue. In summary, the scaled variables 
Amax/TV in the Wishart case and A max /viV in the Gaus- 
sian case, both typically fluctuate over a region of width 
- 0(A~ 2 / 3 ) around their mean and these typical fluctu- 
ations are described by the Tracy- Widom law Fp(x). 

The function Fp{x), computed as a solution of a non- 
linear Painlcvc differential equation [31], approaches to 1 
as x — > 00 and decays rapidly to zero as x — > —00. For 
example, for (3 = 2, F2(x) has the following tails [31], 

F2(x) — > 1 — Olexp[— 4.t 

3/2 /3] 

as x — > 00 

— > exp[-|x| 3 /12] as x — * -00. (11) 

The probability density function fp(x) = dFp/dx thus 
has highly asymmetric tails. 

It follows from (8) that in the Wishart case, the Tracy- 
Widom distribution describes the probability of typical 
and small fluctuations of A max over a narrow region of 
width ~ 0(N 1 ^ 3 ) around the mean (A max ) f» x+(c) N 

where x+(c) = (-^= + l) . 

As mentioned in the introduction, in this paper we 
are concerned not with the typical small fluctuations of 
0(N 1 ^ 3 ) around the mean, but rather with atypical large 
fluctuations of O(N). Thus, we are interested in comput- 
ing the probability of extremely rare events. In fact, the 
question about the large deviation of the largest eigen- 
value was addressed before in [12] and it was proved by 
rigorous methods that for c < 1 the probability of large 
fluctuations to the left of the mean (A max ) ~ x+(c)N, 
behaves for large N as in (3), but an explicit expression 
for the left rate function <I>_(x; c) was missing so far. On 
the other hand, for large fluctuations to the right of the 
mean (A max ) ks x+(c)N, 



1 - Prob [A max < t, N] 



exp 



A$ 1 



.(c) N _ 



N 



(12) 



for t ~ 0(N) > x + (c)N located outside the Marcenko- 
Pastur sea to its right and <&+(x; c) is the right rate func- 
tion that was obtained explicitly in [12]. 

The purpose of this paper is to provide an exact result 
for $_(x; c) for all c < 1. For c > 1 (Anti-Wishart) case, 
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the result holds with M and N interchanged. Let us 
summarize our main results. For the case c = 1, we give 
an explicit expression for the left rate function <I>_(x; 1) 
as stated in (36). Subsequently, the results in (5) and 
(6) follow. For c < 1, the function 4>_(x;c) has a rather 
long analytical expression which is derived in the Ap- 
pendix. However, the function can be easily evaluated 
using Mathematica® as illustrated in Fig. 5. 

These results should be compared to the corresponding 
ones for the Gaussian case. For the Gaussian ensemble, 
the left large deviations follow a similar law, namely 



Prob [An 



< t,N] 



exp 



2 fcGaiLss 



N 



(13) 



where t ~ 0(N 1 / 2 ) < V2A is located deep inside the 
Wigncr sea. For the Gaussian case, (A) = 0. Thus, 
the corresponding question about the probability that all 
eigenvalues are less than their average is the same as the 
probability that all eigenvalues are negative. This prob- 
ability plays a very important role in determining the 
average number of maxima of a random smooth poten- 
tial, where a stationary point is a local maximum if all the 
eigenvalues of the associated hessian matrix are negative. 
The calculation of this probability has been a subject of 
many theoretical and numerical studies with important 
applications in disordered systems, supercooled liquids, 
glassy models [32, 33] and more recently in anthropic 
principle based string theory [34, 35, 36]. Very recently, 
the left rate function $ < f £mss (y) has been computed ex- 
actly using functional integration methods [24]. Using 
this result, it was shown in [24] that for Gaussian matri- 
ces, 

Prob [A max < 0; N] ~ exp(-/3#A 2 ), (14) 
for large N where the coefficient 



6= -log 3 = 0.274653... (15) 

In this paper, we adapt the techniques used in [24] for 
Gaussian matrices to the Wishart case. Similar tech- 
niques have recently been used also in other problems 
such as in the calculation of the average number of sta- 
tionary points for a Gaussian random field with N com- 
ponents in the large N limit [37, 38]. 

Incidentally, let us remark that our problem might be 
tackled also from the completely different viewpoint of 
zero-dimensional replica field theories thanks to their re- 
cently discovered exact integrability [39] . This yet unex- 
plored route may provide an independent derivation of 
our results. 



III. FUNCTIONAL INTEGRATION AND THE 
METHOD OF STEEPEST DESCENT 

Our starting point is the joint distribution of eigenval- 
ues of the Wishart ensemble in (7). Let P/v(i) be the 
probability that the maximum eigenvalue A max is less 
than or equal to t. Clearly, this is also the probabil- 
ity that all the eigenvalues arc less than or equal to t and 
can be expressed as a ratio of two multiple integrals 



P N (t) = Prob[A max < t) 
/„*... JodXi. 



Zi(t) _ 
Zo 

.dX N exp(-|F[A]) 



. dAjv exp(— f F[A]) 



(16) 



where: 



TV 



iV 



i^[A]=^A i -(l+M-iV--)^logA i -^log|A i -A 



(17) 

Written in this form, F mimics the free energy of a 2- 
d Coulomb gas of interacting particles confined to the 
positive half-line (A > 0) and subject to an external lin- 
ear+logarithmic potential, as mentioned in the introduc- 
tion. The denominator in (16), which is simply a nor- 
malization constant, represents the partition function of 
a free or 'unconstrained' Coulomb gas over A € [0, oo). 
The numerator, on the other hand, represents the par- 
tition function of the same Coulomb gas, but with the 
additional constraint that the gas is confined inside the 
box A € [0,t], i.e., there is an additional wall or infinite 
barrier at the position A = t. We will refer to the numer- 
ator as the partition function of a 'constrained' Coulomb 
gas. The constraint should not be effective when t < X- 
or t > x_|_. 

Note that in the Gaussian case, the external potential 
is harmonic over the whole real line (V(A) = A 2 /2), while 
in the Wishart case, V(X) = oo for A < (infinite barrier 
at A = 0) and V(X) = X-(l+M-N - 2/(3) log A 
for A > representing a linear+logarithmic potential. 
By comparing the external potential and the logarithmic 
interaction term, it is easy to see that while for Gaussian 
ensembles a typical eigenvalue scales as A ~ \J~N for large 
N, for the Wishart case it scales as A ~ N. 

After defining the constrained charge density: 



1 N 

(A) := g N (X; t) = - ]T 5(X - X t )0(t - A) (18) 



i=l 



and taking into account the following trivial identity for 
a generic function h(x): 



N 

J2KXi)=N J dXg N (X)h(X) 



(19) 
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we may express, for large N, the partition function Z\{t) 
in (16) as a functional integral [24]: 



Z\(t) oc J T>[qn] cxp 



2 



N / gAr(A)AdA 



N{M - N + 1-2/0) l g N {X)\ogXdX 



o 

N ' 2 [ [ Qn(X)qn{X') log I A — A'|dAdA' 
Jo Jo 

JV / &v(A)log[£iv(A)]dA] } 
Jo 



(20) 



where the last entropic term is of order O(N) and arises 
from the change of variables in going from an ordi- 
nary multiple integral to a functional integral, [{A»}] — > 
[giv (A)]. The constrained charge density qn(X) satis- 
fies the obvious constraints qn(X) = for A > t and 
/ Q N (X)dX = 1. 

Since we are interested in fluctuations of ~ O(N), it is 
convenient to work with the rescaled variables A = xN 
and £ = t/N. It is also reasonable to assume that for 
large N, the charge density scales accordingly as qn(X) — 
N^fiX/N), so that f(x) = for x > ( and J Q C f(x)dx = 
1. 

In terms of the rescaled variables, the energy term in 
(20) becomes proportional to N 2 while the entropy term 
(~ O(N)) is subdominant in the large N limit. Eventu- 
ally we can write: 

Z x (0 cx f 2?[/]exp U|jV 2 S[/>);C] +0(JV)) (21) 



where: 



s if( x )'X} = [ xf{x)dx-a [ f(x)log(x)dx+ 
Jo Jo 

f{x)f{x') log \x - x'\dxdx'+ 
f(x)dx — 1 

/o 



Ci 



(22) 



where we have introduced the parameter a = for 
later convenience. In (22), C\ is a Lagrange multiplier 
enforcing the normalization of /. 

For large N we can evaluate the leading contribution to 
the action (22) by the method of steepest descent. This 
gives: 



Zi(Q oc exp 



l N 2 S[f*(x);{]+0(N) 



(23) 



where /* is the solution of the stationarity condition: 
5S[f(x);( 



Sf(x) 







(24) 



This gives for < x < £: 

x — a log a; + C\ = 2 / f(x') log \x — x'\dx' 
Jo 

Differentiating (25) once with respect to x gives: 

nC /V) 



o 
2a- 



-dx' 



< x < ( 



(25) 



(26) 



where V denotes the Cauchy principal part. 

Finding a solution to the integral equation (26) is the 
main technical task. The next two subsections are de- 
voted to the solution of (26), first for the special case 
c = 1 and then for < c < 1. We remark that the so- 
lution of (26) gives the average density of eigenvalues in 
the limit of large N for an ensemble of Wishart matrices 
whose rescaled eigenvalues are restricted to be smaller 
than the barrier £. We will refer to f(x) as the con- 
strained spectral density. 

Before proceeding to the technical points, it may be 
informative to first summarize the results for the con- 
strained spectral density fix) in the general < c < 1 
case. The most general form is: 



/(*) 



1 y/x-Li(c,C) 

2tt 



A(c,() 



(27) 



where L\ is the lower edge of the spectrum and A is re- 
lated to the mutual position of the barrier with respect 
to the lower edge. In the following table, we schemati- 
cally anticipate the values for L\ and A in the different 
regions of the (c, £) plane: 





c = 1 


< c < 1 


< C, < x+ 
(barrier effective) 


Li = (32) 


L\: see (50) 


A=i±± (32) 


A = > C (44) 


C,>x+ 
(barrier ineffective) 


Li = 


Li = x- 


A = C = 4 


A = C = x+ 



TABLE I: Values of L\ and A in the different regions of the 
(c, Q plane. 



The support of / is: 

Li(c J C)<a?<min[C J A(c,C)] 



(28) 



At the lower edge of the support Li(c, £), the density 
vanishes unless c = 1, in which case it diverges as ~ 

At the upper edge of the support, according to the 
value of the minimum (( or A(c,()) hi (28) the density 
can respectively diverge as ~ 1/VC ~ x or vanish. 

Note that the unconstrained Marcenko-Pastur law (1) 
is recovered from (27) when the barrier is ineffective, i.e. 
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A. The c = 1 case 

In this case, the support of the unconstrained spectral 
density is (0, 4] and the Marcenko-Pastur law prescribes 
an inverse square root divergence at x = . Furthermore, 
the density vanishes at x — 4 (see Fig. (1)). 

In the constrained case, the barrier at £ is only effective 
when < ( < 4. When the barrier crosses the point 
C = 4 from below, the density shifts back again to the 
unconstrained case. 

The integral equation for f(x) (26) becomes: 



1 ^ f C 
2 =V l 



dx' 



0<x<( 



x — X 

The general solution of equations of the type: 



7 dx' = g(x) 



/q X — X 

is given by Tricomi's theorem [40]: 



(29) 



(30) 



i 



n 2 y/x(C-x) 



U! — X 



duo + B 



(31) 

where B is an arbitrary constant. After putting g(uS) = 
1/2 in (31) and determining B by the normalization con- 
dition J Q f(x)dx — 1 we finally get: 



2tt^ x(C — x) 



c 



+ 2 - x 



< x < C (32) 



A plot of this charge density for two values of the barrier 
C is given in Fig. 2. In summary, the average density of 




0.5 



1 . 5 



FIG. 2: Constrained spectral density f(x) for the barrier at 
C = 1 and C = 2. 



states with a barrier at £ is given by: 



2-x 




< C < 4 
C>4 



(33) 



Thus, for all £ > 4, the solution sticks to the £ = 4 case. 
Note that both cases in (33) can be obtained from the 
general formula (27). 

Now we can substitute (33) back into (25) to find the 
value of the multiplier C\ and eventually evaluate the 
action S[f*(x);(] (22) explicitly for < C < 4: 



5(C) := S[f*(x);C] = 21og2 



C C 2 
logC + - - — 
s ^ 2 32 



(34) 



From (23), we get Z X (C) ~ cxp(-/3A 2 5(C)/2). For 
the denominator, Zq = Z\(Q = oo) = Z\(C, = 4) w 
cxp(— (3N 2 S(4)/2), where we have used the fact that the 
solution for any £ > 4 (e.g., when £ = oo) is the same as 
the solution for £ = 4. Thus, eventually the probability 
P/v(i) (16) decays for large TV as: 



fV(t) 



gift) 
exp 



cxp 



2 



/3 



iV"[S(C)-5(4)] 
t 



N 



1 



where the rate function is given by 

'21og2-Iog(4-a?) - | - f| a;>0 



$_(a:;l) = 







x < 



(35) 



(36) 



and is plotted in fig. 3 




12 3 
FIG. 3: Rate function $-(x; 1) (see (36)). 

We now turn to the original problem of determining 
the probability of the following extremely rare event, i.e. 
that all the eigenvalues happen to lie below the mean 
value (A) = f~ XpN^dX = N. Starting from (35), this 
is easily computed by putting the barrier at the mean 
value t = N, i.e., £ = 1. We then get for large N 



Prob [A max < (A) =N,N}~ exp[-0(l)7V 2 



(37) 



where 



9(1) = |*_(3;1) 



(38) 
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Since we are calculating here the probability of neg- 
ative fluctuations of A max of O(N) to the left of the 
mean (A max ) = x + (c)N, when the argument of the left 
rate function $_(x;l) is very small (i.e., for fluctua- 
tions -C O(N)), (35) should smoothly match the left 
tail of the Tracy- Widom distribution that describes fluc- 
tuations of order ~ 0(N 1 ^ 3 ) to the left of the mean 
(A m ax) = x + (c)N. Indeed, from (36) as x — ► 



(39) 



and substituting (39) in (35) we get, for fluctuations <C 
O(N) to the left of the mean, 



P N (t) ~ exp 

= exp[-| X | 3 /12] 







— N 2 U-t/N) 3 
384 v ' ; 



(40) 



where % = (* - 4A^)/(2 4 / 3 iV 1 / 3 ). This coincides with 
Johansson's result for the Tracy- Widom fluctuations in 
(8) for c — 1 and comparing (40) and (11), we see that 
we recover the left tail of the Tracy- Widom distribution. 



B. The c < 1 case 

Our approach is very similar to the previous case. 
However, some additional technical subtleties, which we 
emphasize, arise in this case. 

As in the unconstrained case, we expect a lower bound 
Li = Li(c,Q) to the support of the constrained f(x). 
The parameter L\ will be determined later through the 
normalization condition for f(x). 

It is convenient to reformulate (26) in terms of the new 
variable y = x — Li, measuring the distance with respect 
to the lower edge of the support, where f(x) is assumed 
to vanish. 

Equation (26) then reads: 



1 



2(y + Li) 



= V 



m 

o y-y' 



dx' 



0<y <L (41) 



where we have denoted L = ( — L\ and f(y) = f(y + L\) 
The general solution of (41) in this case is: 



f(y) 



i 



K\Jy(L - y) 



a y/L^L + Li) 
2 y + L 1 



+ B' 



(42) 



and the constant B' is determined by the condition f(y 
0) = 0. Thus we get: 



Kv) 



Vv 



A-Li-y 



where: 



2ir\/L - y [ y + Li 
A = A(c,C)=ay/C/T 1 



(43) 



(44) 



Note that everything is expressed in terms of the only 
still unknown parameter L\. 

From (43) we can infer two kinds of possible behaviors 
for f(y) due to the competing effects of the singularity 
for y — > L (where the denominator vanishes) and the sup- 
pression for y —> A — L\ (where the numerator vanishes). 

Thus, depending on which of the following two condi- 
tions applies once we have put the barrier at £: 



A(c, C) - L X (c, C) > L(c, VLi(cXK < ol (I) 

A(c, C) - L\ (c, C) < L(c, a/£i(c,C)C > ol (II) 

(45) 

/ can diverge at y = L or vanish at A — L\ respectively. 
In (45) we have restored the functional dependence for 
clarity. 

This is a subtle point because, given the barrier at 
C, we cannot determine a priori which of the previous 
conditions holds. In fact, £i(c, £) should be determined a 
posteriori separately for each case from the normalization 
condition: 



f(y)dy = l 



(46) 



Once this is done, it turns out that the conditions (45) 
can be reformulated in terms of the position of the barrier 
C in the following much simpler way: 



o<C<z+ (i) 

C>x+ (II) 



(47) 



We summarize here the final results in the two cases. 

Case I. < ( < x + 

The normalization condition (46) leads to the following 
cubic equation for w = w(c, () = y/ L\(c, £): 

w 3 - [2(2 + a) + (]w + 2ay/( = (48) 

which has always three real solutions, one negative (wq) 
and two positive: 



, v\ 2 P fe + 2kir 

Wk(c,Q = - , <~ cos 



0,1,2 (49) 



where: 



P 

q 

B 



-[2(2 + a) + C] 
2aVC 

arctan 
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Note that W2 < w\. With simple considerations, we con- 
clude that the right root to be chosen is u^c, Q. Thus: 



L l {c,Q = wl{c,C) 



(50) 



Finally, we can write down the full constrained un- 
shiftcd spectral density as: 



/(*) 



1 y/x-L^X) 

2n 



A(c, C) - x 



(51) 



valid for Li(c, () < x < ( where L±(c, C) is given by (50) 
and A(c,C) by (44). 

A plot of /(x) for c = 0.1 and £ = 14 is given in fig. 4. 
In this case, Li(c, C) ~ 4.60084 and A(c, () w 15.6996. 




10 12 14 



FIG. 4: Constrained spectral density f(x) for c = 0.1 and 
C = 14. 



Case II. £ > x+ 

In this case, the barrier is immaterial and we should 
recover the unconstrained Marcenko-Pastur distribution. 
The support of f(y) is [0, A — Li] and this implies that 
we can safely put L = A — L\ in (46). 

The integration can be performed and coming back to 
the unshifted spectral density f(x) we get: 



/» = 



1 \Jx — Li^/L 2 — ■ 



2tt x 
valid for L\ < x < L2 where: 




(52) 



(53) 



which is the unconstrained Marcenko-Pastur distribu- 
tion, as it should. 



It is interesting to evaluate the limit c — > 1 in 
(51) and (52) in order to recover the result (33) in 



subsection III A. The case of equation (52) is obvious. 
For the other, it is a matter of simple algebra to show 
that: 



lim Li(c,C) = 

c— >1~ 

lim A(c,C) = (C + 4)/2 

e->l- 



(54) 
(55) 



so that (51) matches (33). 

Furthermore, Cases I and II should match smoothly 
as C hits precisely the limiting value x+. This corre- 
sponds to A(c, Q = C — > A(c, x+) = x+- It is again 
straightforward to check that this last condition implies 
Li(c,x + ) = x- so that (51) recovers (52). 

The interesting case for computing large fluctuations 
is Case I. One can insert (51) into (22) in order to eval- 
uate (23). It turns out that the integrals involved can 
be analytically solved in terms of derivatives of hyperge- 
ometric functions, but a more explicit formula is derived 
in the Appendix. We give here a plot of the rate function 
$_(x; c) that describes the large fluctuations of O(N) to 



the left of the mean (A r 



P N (t) 



Zi{t) 



exp 



exp 



.(c)N: 



N 2 {S(C)~S(x + )} 



N 



(56) 



The plot is given in Fig. 5 for several values of c ap- 
proaching 1. The limiting case <&_(x; 1) (36) is also plot- 
ted. 



S_ (x; c) 
5r 




/ / 



8 10 



FIG. 5: Rate function $_(x;c) for the following values (from 
left to right) of c = 1, 0.8, 0.6, 0.4, 0.2. See also Figure 3. 



We can now compute to leading order the probability 
that all the eigenvalues are less than the mean value (A) = 
N/c. This amounts to putting the barrier at t — N/c 

in (56), which gives <f>_ (^-^ + l\cj. Several numerical 

values are given in the following table. 
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c 


*-(£ + !;«) 


0.1 


0.475802 


0.2 


0.449162 


0.4 


0.414592 


0.6 


0.390245 


0.8 


0.37104 


0.95 


0.358805 


1 


0.355044 



TABLE II: Some values of the rate function (see text for fur- 
ther explanation). 



IV. NUMERICAL RESULTS 

The formulas (33), (35), (51) and (56) have been nu- 
merically checked on samples of hcrmitian matrices (/3 = 
2) up to N = 30, M = 300 and the agreement with the 
analytical results is already excellent. We describe in this 
section the numerical methods and results. 

A direct sampling of Wishart matrices up to those sizes 
is computationally very demanding. We applied the fol- 
lowing much faster technique, suggested in [22]. 

Let Lp — BpBp be the tridiagonal matrix correspond- 
ing to: 



B* 



X2a 

X/3(N-1) X2a-P 



\ 



(57) 



X0 X2a-P(N-l)J 



Bp is a square N x N matrix with nonzero entries on 
the diagonal and subdiagonal and a = (/3/2)M. The 
nonzero entries Xk are independent random variables ob- 
tained from the square root of a x 2 -distributed variable 
with k degrees of freedom. It has been proved in [22] that 
Lfj has the same joint probability distribution of eigen- 
values as (7). Thus, as far as we are interested in eigen- 
value properties, we can use the L@ ensemble instead of 
the original Wishart one. This makes the diagonaliza- 
tion process much faster due to the tridiagonal structure 
of the matrices Lp. 

We report the following four plots: the first two (fig. 
6 and 7) are for the case c = 1 and the last two (fig. 8 
and 9) for the case c = 0.1. 

In fig. 6, we plot the histogram of normalized eigen- 
values X/2N for an initial sample of 3 x 10 5 hermitian 
matrices (f3 = 2, N = M = 30), such that matrices with 
Amax/2Af > ( are discarded. The barrier is located at 
C = 3. On top of it we plot the theoretical distribution 
(33). 

In fig. 8, we do the same but in the case N = 10, M = 
100. The barrier is located at £ = 14. The theoretical 
distribution is now taken from (51). 

To obtain the plots in fig. 7 and 9, we generate f» 



5 x 10 5 L2 matrices for different values of A" = 7 — > 30 
(or 15). The parameters (c, £) are kept fixed to the value 
(1,3) for fig. 7 (x+ = 4) and (0.1,14) for fig. 9 (x+ ~ 
17.32). The constraining capability of those barriers can 
be estimated by the ratio re(c, £) = (x+ — C)/(%+ — %-), 
corresponding to the window of forbidden values for the 
largest eigenvalue. We get re(l, 3) = 0.25 and k(0.1, 14) w 
0.26, to be compared with the values of k(c, £) = (2 + 
v / c)/4 for the barrier at the mean value C = 1/c, which 
would give respectively K = 0.75 and k w 0.58. This 
relative mildness of the constraint allows us to get a much 
more reliable and faster statistics in the simulations. 

For each value of N, we determine the empirical fre- 
quency r(N) of constrained matrices as the ratio between 
the number of matrices whose largest rescaled eigenvalue 
is less than £ and the total number of samples (5 x 10 5 ). 
The logarithm of r(N) vs. the size N is then naturally 
fitted by a parabola aN 2 + bN + c to test the prediction 
for a in formulas (35) and (56). 

The best values for the coefficient a of the leading 
term are estimated as —0.006153 (c = 1) and —0.0357 
(c = 0.1), to be compared respectively with the theo- 
retical prediction $_(1;1) w -0.006432 and &-(x+ - 
14; 0.1) w —0.03666. Despite the relatively small sizes 
and the O(N) corrections, the agreement is already good. 




FIG. 6: Constrained spectral density Qn(X) for N = M = 30. 
The barrier is at C, = 3. In dotted green the histogram of 
rescaled eigenvalues over an initial sample of 3 x 10 5 matrices 
(/? = 2). In triangled red the theoretical distribution. 



V. CONCLUSIONS 

In this paper we have studied the probability of atyp- 
ically large negative fluctuations (with respect to the 
mean) of the largest eigenvalue A max of a random Wishart 
matrix. The standard Coulomb gas analogy for the joint 
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5 10 15 20 25 30 



N 

FIG. 7: Natural logarithm of the probability that all the 
rescaled eigenvalues are less than ( — 3 vs. N for the case 
c = 1 (x+ = 4). The data points are fitted with a parabola 
(solid line). 
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2 4 6 8 10 12 14 

V2N 

FIG. 8: Constrained spectral density Qn(X) for N = 10, M = 
100 (c = 0.1). The barrier is at ( — 14. In dash-dotted green 
the histogram of rescaled eigenvalues over an initial sample 
of 5 x 10° matrices (/3 = 2). In triangled red the theoretical 
distribution. 
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N 

FIG. 9: Natural logarithm of the probability that all the 
rescaled eigenvalues are less than £ = 14 vs. N for the case 
c = 0.1 (x+ ~ 17.32). The data points are fitted with a 
parabola (solid line). 

tivariate statistical analysis of data. Our main result is 
to show that, to leading order in N, this probability de- 
cays as ~ cxp[— |A^ 2 $_(--^ + l;c)], where $_(a;;c) is 
a rate function that we have explicitly computed. The 
quadratic, instead of linear, iV-dependence of the expo- 
nential reflects the eigenvalue correlations. 

Furthermore, our method allows us to determine ex- 
actly the functional form of the constrained spectral den- 
sity, i.e., the average charge density of a Coulomb gas 
constrained to be within a finite box A G [0, t]. 

All the analytical results are in excellent agreement 
with the numerical simulations on samples of hermitian 
matrices up to N = 30, and the estimates of the large 
deviation prefactor are already good even for N ~ 15. 

One of us (PV) has been supported by a Marie Curie 
Early Stage Training Fellowship (NET- ACE project). 
We are grateful to Gemot Akemann, Igor Krasovsky 
and Yang Chen for helpful comments and for pointing 
to us relevant references. The support by Sergio Consoli 
(Brunei) for the numerical simulations is also gratefully 
acknowledged. 



probability distribution of eigenvalues allowed us to use 
the tools of statistical physics, such as the functional in- 
tegral method evaluated for large N by the method of 
steepest descent. Using these tools, we have analytically 
computed the probability of large deviations of A max to 
the left of its mean. In particular, our main motivation 
was to compute the probability of a rare event: all eigen- 
values are less than the average (A) = N/c. This implies 
that the largest eigenvalue itself is less than (A) = N/c. 
This question is relevant in estimating the efficiency of 
the 'principal components analysis' method used in mul- 



APPENDIX: RATE FUNCTION FOR c < 1 

We evaluate in closed form the action S(() : = 
S[f*(x); C] (see (22)) for the case c < 1, where f*(x) 
is given by (51). The result is in eq. (A. 3). 

The rate function <!>_(a;; c) for c < 1, given by: 

$_(x;c) = S(x + -x) - S(x + ) (A.l) 
can be evaluated immediately. 
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After inserting (51) into (22) and determining C\ from 
(25), we find that S(() is given by: 



s(0 



c 



i J f(x)xdx - ^ J f(x) \og{x)dx+ 
C f{x) \og(x - L x )dx + h - £ logiL,) (A.2) 



After the substitution x = (£ — L\)t + L\ in the integrals 
in (A.2) and some simple algebra, S(Q can be expressed 
as: 

5(C) = -fe 1 -e 2 + ^^is + ^-|iog(L 1 ) (A.3) 

where 0^. and S are the following functions of c and (: 



^ {log(C 



Li) 



A 



-To 



t-Li "\C-L 



Li 



(-Li 
„ A 3 



I* 



Li 



} 



4 16 ^ 16 2tt 3 \C-L! 



1 



- log(2) 



(A.4) 



The functions 2& (x) are given by the following integrals 
d 



dx 



Zi(aO 
X 2 (x) 

^ (re) 



log(i + x) t 



dt 

o i + x 
l 



1 -t 



logi 



t 



f + x V 1 — i 

l 

c?t log(i + X 





1 -i 



(A.5) 
(A.6) 
(A.7) 
(A.8) 



which, following very closely rcf. [28] paper I, appendix 
B, can be computed explicitly in closed form. 

The integral I$(x) (and thus also To (a;)) can be com- 
puted by Mathematica®: 



7T 

2 

log 



l + 2x- 2 v / a;(l + x) + 2 log 







1 


1 + 










x 



(A.9) 



while X\ (x) and I 2 (a;) are given in terms of derivatives of 
hypcrgcomctric functions. More explicit expressions can 
be given as follows, starting with T\{x). Exploiting the 
identity h x \ogh = d\h x , we can rewrite the integral as: 



d x I dt(t + xY 




t 



1 -t 



x=-i 



(A.10) 



and the integral in (A. 10) can be evaluated in terms of 
Rummer's hypergcomctric function: 



-A,|;2;-- 
2 x 



(A.ll) 



Now, applying the transformation formulas [41] [15.3.7 
pag. 559] and evaluating the derivatives of Gamma func- 
tions that arise, we finally get: 



21og4 + 2 ii(x) - 2 
2^/x i 2 {x) ] 



x I Ax 
t~t — log — 

l + x \ e z 



where: 

ii(x) = [dp 2 -Fi(i - Mi -m; "A* + V 2 ; -z)] 



(A.12) 
(A.13) 



i 2 (x) = d„ {l+xf- 1 ' 2 2 F 1 { f i, + 3/2; -x) 



(A.14) 



To evaluate i±(x) and i 2 (x), we use the series expansion 
for hypergeometric functions [41] [15.1.1 pag. 556] and 
upon differentiation we get: 

h(x) = -f^B (^,ny-x) n (A.15) 

where B(v,w) = r (^) r (" J ) j s Euler's Beta function. Intro- 
ducing the integral representation of the Beta function: 



B(x,y) 



dt e^ii-ty- 1 



(A.16) 



into (A.15) and upon exchanging summation and inte- 
gral, we arrive with the help of Y^Lo(~ x t) n = (1 + xt) -1 
to: 



i\{x) = x 



dt 



l vT-*(l + xt) \l l + x 
Following the same procedure, we get for i 2 (x) 



arcsinh(v / x) 
(A.17) 



1 



[log(l + x)- ii(a 



s/l + x 

where i\{x) is defined in ref. [28] as: 



l + x 

i\(x) = —2 + 2\ arctanh 

x \ V 1 + x 

From (A.12) we get the final result for Ti(x) 



(A.18) 



(A.19) 



Ii{x) = 7r < — log 4 + ./ — - — [2arcsinh(-\/x)- 



+ 2\jl-\ — arctanh 

x 



l + x 



log[4z(l + z)] ]} 
(A.20) 



Following the very same procedure as in the previous 
case, we find for X 2 (x): 



J 2 (x) = 7T 



- log 4 



— (2arcsinh(v / x) — log(a;)) 
(A.21) 
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Now we compute the limit c — > 1~ in (A. 3) to recover 
(34). Given that, for c — > Li — > 0, a — > and 
A — > (C + 4)/2, we have to evaluate the integrals Xfc(x) 
for ie — ► 0. This gives: 



2b (0) 
li(0) 
J 2 (0) 

2 3 (0) 



7T 

— 7rlog4 
— 7rlog4 

-|(log4-l) 



Then, S[f*(x);{] 



-03 



2" 



(A.22) 
(A.23) 
(A.24) 

(A.25) 
2 log 2 - 



logC 



:S2 



as it should (see (34)). 
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